Parse entire macsima metadata from OME metadata #349

MSHelm · 2025-12-10T10:04:47Z

Fixes #348

The parser now tries to use the filename stored in the tiff metadata, if it exists. This is more robust, as users typically do not modify these entries.
If this metadata doesnt exist (which is the case for older data), it falls back to the actual filename.

Since the lung dataset used for testing is using an old format, I added a newer data set that is publicy available (https://zenodo.org/records/14008816). I documented this in the tests as well.

Full disclosure: I am working for Miltenyi, but I love your work ;)

codecov-commenter · 2025-12-10T10:06:53Z

Codecov Report

❌ Patch coverage is 11.89189% with 163 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.70%. Comparing base (23bcb72) to head (904abfc).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
src/spatialdata_io/readers/macsima.py	11.89%	163 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #349      +/-   ##
==========================================
- Coverage   55.07%   52.70%   -2.37%     
==========================================
  Files          26       26              
  Lines        2809     2975     +166     
==========================================
+ Hits         1547     1568      +21     
- Misses       1262     1407     +145

Files with missing lines	Coverage Δ
src/spatialdata_io/readers/macsima.py	`20.82% <11.89%> (-7.76%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

MSHelm · 2025-12-12T15:27:55Z

I checked the tests and the failing ones are all due to some Xenium and Visium HD readers, which I didnt touch XD Did they exist before this PR already?

MSHelm · 2025-12-16T15:10:07Z

Hey everyone,
with the last commits this turned into a bit of a bigger contribution. With this PR, we now solely rely on the metadata stored in the OME object. This should be significantly more robust than parsing file names, and enables us to add more metadata. In principle there would be even more information available than I currently implemented, but I think the most interesting bits are covered here.

I implemented 2 versions of the parser, as there are 2 different formats of the metadata. I removed the previously used data sets, as they are not part of the CI. Instead, I added 2 publicly available data sets, one for each version.

I tried modifying the github action to download them, but it seems that this was not really successful. If someone can give me some pointers that would be wonderful.

Cheers!

LucaMarconato · 2025-12-28T12:19:50Z

Hi @MSHelm, thanks for the contribution! I’ve fixed the unrelated failing tests on main, and your PR is now green.

I had a quick look but before full review I kindly ask you to verify and confirm which items from the contribution guide checklist apply to this PR. Ensuring the contributions guidelines are followed help streamline our review process and improve long-term code maintainability. Much appreciated!

… metadata

MSHelm · 2026-01-06T16:03:51Z

Hey @LucaMarconato

As requested below is the checklist.

Over christmas I also thought about whether we should be conservative or lenient with missing metadata?
The current implementation is a bit wonky, as the parsers themselves just create a dictionary that allows missing metadata. But the actual ChannelMetadata object only allows missing data for clones (for example for DAPI stainings). This can lead to unintuitive errors. Either I would change the ChannelMetadata object to allow missing metadata, except for name and cycle (as in the previous implementation), or be strict and directly return a ChannelMetadata object from the parsers.

Regarding the test data, I will provide some small versions of OMAP10 and OMAP23 via zenodo for the CI. Should then these be used for the unit tests?
It seems to me that the large original data is currently not fetched in the CI, and no tests on macsima are run due to this.

Checklist

Specification
- Provide a link to the public specification/changelog of the raw data format (if no public specification available yet, reach out to scverse privately (Zulip/email)) -> See Zulip message
Reader implementation
- Implement reader functions in src/spatialdata_io/readers/ (if no formal spec available, place under experimental/)
- Place string constants in src/spatialdata_io/_constants/_constants.py
Example data
- Provide (very) small public test dataset(s) (~100kB–10MB, preferred), licensed with a permissive license (e.g. CC BY 4.0)
- Ensure dataset(s) cover edge cases of the format
- In addition to the small dataset(s), consider creating scripts for downloading and converting a real dataset in spatialdata-sandbox:
  - -> See Add OMAP10 and OMAP23 as examples for MACSima v0 and v1 format giovp/spatialdata-sandbox#57
  - download.py (fetch raw data → data/)
  - to_zarr.py (convert raw data → data.zarr)
  - Include requirements.txt (or add PEP 723 metadata to the scripts) for dependencies not included in spatialdata (Not relevant)
- If the data is not available online, upload it to the "SpatialData Submissions" Zenodo community
- We do not support private data, with the exception of data that will soon be made available to the public. In such cases, please reach out to us. Then, please provide a private repo with download.py and to_zarr.py (or upload the data privately to the "SpatialData Submissions" Zenodo community)
Tests
- Add test functions for the reader
- Test multiple versions of the raw format if applicable
- Verify visualization/alignment with spatialdata-plot or napari-spatialdata
- Add proxy tests for spatial extent correctness (spatialdata.get_extent())
- Add proxy tests for coloring/annotations of selected elements/features -> will do once the small data set is done and integrated
- Add tests for auxiliary/helper functions if present
Extra
- Update CLI in src/spatialdata_io/__main__.py if reader/converter is added/modified
- Consider contributing to the public project board (currently view only, please reach out for edits)
- Provide feedback on this contribution guide to improve it further

LucaMarconato · 2026-01-06T16:27:47Z

Thank you for the update and for going through the checklist!

Regarding the test data, I will provide some small versions of OMAP10 and OMAP23 via zenodo for the CI. Should then these be used for the unit tests?
It seems to me that the large original data is currently not fetched in the CI, and no tests on macsima are run due to this.

Yes please, the small datasets (and only the small datasets) should be included in the CI.

Parse cycle from ame in tiff metadata, not actual filename

5c588fc

MSHelm changed the title ~~Parse cycle from ame in tiff metadata, not actual filename~~ Parse macsima cycle from name in tiff metadata, not actual filename Dec 10, 2025

MSHelm added 3 commits December 15, 2025 22:20

Parse additional metadata from filename stored in ome tiff

8b49711

Parse all metadata from ome metadata, nothing from filename

82d304f

Add public OMAP data sets for testing and adjust tests

cef7170

MSHelm changed the title ~~Parse macsima cycle from name in tiff metadata, not actual filename~~ Parse entire macsima metadata from OME metadata Dec 16, 2025

Merge branch 'main' into fix/macsima-cycle-parsing

0aae9ce

Emitt warning when duplicate key with different value is found in OME…

904abfc

… metadata

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Parse entire macsima metadata from OME metadata #349

Parse entire macsima metadata from OME metadata #349

MSHelm commented Dec 10, 2025

Uh oh!

codecov-commenter commented Dec 10, 2025 •

edited

Loading

Uh oh!

MSHelm commented Dec 12, 2025

Uh oh!

MSHelm commented Dec 16, 2025

Uh oh!

LucaMarconato commented Dec 28, 2025

Uh oh!

MSHelm commented Jan 6, 2026

Uh oh!

LucaMarconato commented Jan 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Parse entire macsima metadata from OME metadata #349

Are you sure you want to change the base?

Parse entire macsima metadata from OME metadata #349

Conversation

MSHelm commented Dec 10, 2025

Uh oh!

codecov-commenter commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

MSHelm commented Dec 12, 2025

Uh oh!

MSHelm commented Dec 16, 2025

Uh oh!

LucaMarconato commented Dec 28, 2025

Uh oh!

MSHelm commented Jan 6, 2026

Checklist

Uh oh!

LucaMarconato commented Jan 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov-commenter commented Dec 10, 2025 •

edited

Loading